Genomic heterogeneity of multiple synchronous lung cancers in Chinese population

Abstract Introduction It is clinically challenging to infer the phylogenetic relationship between different tumor lesions of patients with multiple synchronous lung cancers (MSLC), whether these lesions are the result of independently evolved tumor or intrapulmonary metastases. Methods We used the Illumina X10 platform to sequence 128 stage I lung cancer samples collected from 64 patients with MSLC. All samples were analyzed for mutation spectra and phylogenetic inference. Results We detected genetic aberrations within genes previously reported to be recurrently altered in lung adenocarcinoma including, EGFR, ERBB2, TP53, BRAF, and KRAS. Other putative driver mutations identified were enriched in RTK‐RAS signaling, TP53 signaling, and cell cycle. Also, we found some interesting cases, two cases that carried EGFR L858R and T790M co‐mutation in one tumor and another tumor with only EGFR 19del, and 1 case with two KRAS hotspots in the same tumor. Due to the short follow‐up time and early stage, further investigation is needed to determine whether this unique mutation profile will affect their progression‐free survival (PFS) and overall survival (OS). Regarding genetic evolution analysis among 64 tumor samples, 50 of them display distinct mutational profiles, suggesting these are independently evolved tumors, which is consistent with histopathological assessment. On the other hand, six patients were identified to be intrapulmonary metastasis as the mutations harbored in different lesions are clonally related. Conclusion In summary, unlike intrapulmonary metastases, patients with MSLC harbor distinct genomic profiles in different tumor lesions, and we could distinguish MSLC from intrapulmonary metastases via clonality estimation.


| INTRODUCTION
Lung cancer is the leading cause of global cancer mortality.In recent years, advancements in early detection and diagnostic techniques have led to an increased identification of nodules.Multiple primary lung cancers (MPLCs) refer to the occurrence of multiple primary tumors in different areas of the lungs within a single patient, which can be synchronous or metachronous. 1The prevalence of multiple synchronous lung cancers (MSLC) has been reported to range from 0.2% to 8% among lung cancer cases, [2][3][4][5] distinguishing them from pulmonary metastases.Treatment plans for MPLCs depend on whether they are MSLCs or metastatic lung cancers and their TNM staging. 6urrently, there is no universally accepted standard for diagnosing and treating MSLCs; however, most clinicians refer to the criteria published by Martini and Melamed in 1975. 1 In 2003, the American College of Chest Physicians (ACCP) suggested that molecular markers could aid in distinguishing genomic features when tumor foci share the same histological subtype. 74][15][16] Ramsey Asmar et al. found concordance in oncogenic driver mutations (e.g., EGFR, KRAS) between primary tumors and metastasis 17 ; conversely Yu Liu et al. highlighted genomic heterogeneity within MSLC cases. 13Snehal B. Patel et al. 18 confirmed this view by 50 gene panel in 2017: they observed high mutational concordance between primary-metastatic pairs with consistent driver mutations but distinct mutational characteristics among MSLC cases.Several similar studies support these findings. 11,12,19,20owever, the consistency of Histologic Prediction via CT imaging with genomic features remains controversial due to the lack of benchmarking against molecular approaches.A study by Chang et al. found that 22% (17 of 76) patients showed discordant results between prospective histologic prediction and final molecular classification.Furthermore, the discordance rate was significantly higher for intrapulmonary metastasis (IPMs) (11/25, 44%) than MSLCs (6/51, 12%). 21Till now, in clinical practice, gene markers have not been widely used in diagnosing MSLC due to the absence of reliable markers and algorisms to determine whether two or more tumor foci are of different origins.In this study, we analyzed 64 patients including 128 samples using NGS targeted gene sequencing to explore their mutation profiles and developed a clonality index (CI) through a mathematical algorithm to distinguish MSLC from intrapulmonary metastasis.

| Patients
This clinical trial was initiated in 2018, during which we sequenced a total of 128 stage I lung cancer samples collected from 64 patients with MSLCs using the Illumina X10 platform between October 2018 and October 2019.Resectable specimens were obtained for High Throughput Sequencing (HTS) along with matched blood normal controls.All patients were diagnosed with multiple synchronous lung adenocarcinomas by pathologists at the first visit.None of the patients had extrathoracic metastasis or received neoadjuvant therapies before surgery, and their clinical data, including age, sex, smoking history, tumor pathological information, and tumor biomarker information was collected.Each patient had only two lesions.Each patient had only two lesions that shared similar histologic type, radiographic appearance and growth patterns; thus they could not be diagnosed as either MSLC or intrapulmonary metastases.
This study was approved by the Local Ethics Committee at the First Affiliated Hospital of Dalian (YJ-KY-FB-2019-18), and written informed consent was obtained from the patient.

| Sample collection and genomic profiling
Tumor tissues were snap-frozen and stored in liquid nitrogen after collection; Tumor tissue samples were less than 1.5 cm × 1.5 cm in size with a thickness ranging from 2 to 5 mm.Genomic DNA was extracted from fresh-frozen samples using QIAamp DNA Mini kit (Qiagen, 51306, Valencia, CA, USA) according to the manufacturer' s protocol, and fragmented for constructing a library using KAPA Hyper Prep kits (KAPA, KK8504) and captured using the Agilent SureSelect XT Human All Exon v5 kit (Agilent Technologies, Santa Clara, CA, USA).Using Genetron's 509 cancer panel, exons of major cancer genes of all tumor tissue and corresponding WBC were sequenced on Illumina NovaSeq6000 provided by Genetron Health.

| Statistical analysis
Both tumor DNA samples and their matched normal DNA samples were subjected to standardized computational workflow for analysis: Raw sequence data (FastQ format) underwent trimming and filtering using Trimmomatic 0.33 using the following parameters:1.
ILLUMINACLIP:TruSeq3-PE-adapter.fa:2:30:10:8:true; 2. TRAILING:3; 3. SLIDINGWINDOW:4:15; 4. MINLEN:36. 22Paired-end clean reads then were aligned to the human reference genome (hg19) using the Burrows-Wheeler Aligner (BWA, version 0.7.10-r789) with default parameters. 23Duplicate removal, local realignment, and base quality recalibration were performed using PICARD (http:// broad insti tute.github.io/ picard/ , version 1.103) and the Genome Analysis Toolkit (GATK, version 3.1-0-g72492bb). 24Somatic single nucleotide variations (SNVs) were called using Mutect (version 3.1-0-g72492bb), 25 incorporating the Catalog of Somatic Mutations in Cancer (COSMIC) v54 and dbSNP138 as reference sets of known somatic and germline mutations, respectively.Small indels were identified through strelka (version 1.0.14)employing defaults parameters with BAM as input. 26The effects of variants were annotated using Variant Effect Predictor (VEP).CREST was utilized for detection of somatic structural variants (SVs), while Control-FREEC was employed for analysis of somatic copy-number variants including loss of heterozygosity (LOH).The clonality index (CI) was calculated according to previous methods. 27or Calculation of CI, we leveraged genomic data of lung adenocarcinoma samples in TCGA to construct positive (IPM) and negative (MPLC) dataset.In the positive dataset, we randomly selected a proportion of somatic mutations from each lung adenocarcinoma sample twice (with replacement) to generate pairs of mutation sets.For each shared mutation, we modeled the probability of observing it in both samples as binomial distribution , where p is the population prevalence of such mutation in lung adenocarcinoma cohort in TCGA.Therefore, the probability of observing a given set of M identical mutations in the two samples is given by ∏ M m=1 P(X ) m .The CI is defined as CI = − log 10 ∏ M m=1 P(X ) m .In contrast, for the negative dataset, pairs of mutation sets were generated from different samples instead of the same sample.The CI was subsequently calculated through same procedure.CIs obtained from positive to negative datasets represent the clonality of IPM and MPLC respectively.Optimum CI cut-off was then determined according to AUROC curve analysis.
To objectively establish a threshold for clonal relatedness, we utilized the mutational data from 514 unrelated Lung adenocarcinoma from TCGA.As a positive control (i.e., clonally related) to simulate heterogeneity between biologically related samples, we randomly selected 40%, 60%, and 80% of the set of mutations from the pool of 505 unrelated samples.Conversely, as a negative control (i.e., unrelated), we randomly selected an equivalent number of pairs (i.e., 3 × 505 = 1515) of unrelated samples from TCGA.

| Patient
A total of 64 patients at the First Affiliated Hospital of Dalian Medical University between July 2018 and April 2020 were identified based on multiple primary lung cancer by clinical-pathologic criteria.The clinical information of these patients is summarized in Table 1 and Table S1.All patients presented with two histologically distinct tumor lesions.The median age was 56 years (ranging from 29 to 75 years), 53 of them were non-smokers.Among all these patients, 27 (42%) patients had both tumors in the same lobe, 37 (58%) in different lobes.For the 128 tumors, all of them were diagnosed as adenocarcinoma.None of the enrolled patients received neoadjuvant chemotherapy or any other anticancer therapy prior to surgery.

| The genetic landscape of MSLC
We analyzed the mutation pattern of 128 samples and identified a total of 292 genetic alterations, including 290 somatic, non-synonymous mutations, one amplification and one fusion.The average number of mutations per tumor was 0.95 (ranging from 0 to 15.08).Mutations were detected in at least one tumor in 118 tumors (92.2%), while no mutation was detected in the remaining 10 tumors (7.8%).The mutational landscape is shown in Figure 1.

| Genomic heterogeneity of multiple synchronous lung cancers
The driver mutation profiles of different tumor foci in the same individuals were compared to each other.Based on whether it contains the same driver mutation among different tumors in the same individual, 64 patients were separated into 3 groups.Group 1 comprised 10 patients who had one tumor without any detected mutations.Group 2 included individuals who either had identical The mutation landscape of 128 samples.
driver mutations or no driver mutation but exhibited other gene mutations that were shared between different tumors.Samples in group 3 displayed distinct driver mutations or no other shared gene mutations within an individual.There were 10 (15.6%), 14 (21.9%), and 40 (62.5%)patients in group 1, 2, and 3 separately (Figure 2).In group 1 and 3, for there were no same mutations in the two separated tumors, there was concordance with the conclusions drawn from pathology and imaging evidence; however, for individuals in group 2 who carried either the same driver mutation or other shared mutations (without a driver mutation), further investigation is required to determine their origin.

MSLC with intrapulmonary metastasis
For the 14 patients in group 2, which shared gene mutation between different tumors in one individual, we defined a clonality index (CI) to differentiate MSLC with intrapulmonary metastasis.This index assesses the likelihood of co-occurring mutations beyond chance, considering all somatic synonymous and non-synonymous alterations.The result of these 14 patients and the cut-off for CI is shown in Table 2.Among them, six patients have CI dramatically above the cut-off which means these tumors might be intrapulmonary metastasis.In addition to common driver mutations, these individuals also exhibited shared passenger mutations across their tumors.The remaining eight patients had CI close to the cut-off threshold (7.77,Table 2), necessitating further confirmation through integration of additional clinical information.Notably, the shared mutations observed in these cases were limited to hotspot regions within driver gene.

| DISCUSSION
In recent years, there has been an increasing incidence of multiple synchronous lung cancer (MSLC) cases being diagnosed.MSLC generally has a better prognosis.However, distinguishing it from metastatic lung cancer remains a challenge in clinical practice.Many clinical cases cannot be solely diagnosed based on tumor location and histological features alone.With the recent advancements in genomic technology, we aimed to characterize the genomic features of individual tumor foci in MSLC and explore the potential use of genomic markers to facilitate its diagnosis.
There have been several reports suggesting the heterogeneity of different tumor foci in MSLC. 11,12,28,29Yu Liu et al. conducted a genomic analysis of 15 lung adenocarcinomas and one regional lymph node metastasis from 6 patients, revealing distinct genomic profiles in all 15 lung tumors. 11Another study utilized a combined histo-molecular algorithm based on 22 hotspot genes to classify MSLC with intrapulmonary metastasis and guide adjuvant treatment decisions. 30A recent study performed targeted sequencing of a 464-gene panel in 16 patients with multiple tumors, demonstrating the utility of DNA sequencing when pathology is inadequate to make a conclusive diagnosis. 31However, these studies had limited sample sizes or focused on fewer genes.In our study, we employed NGS sequencing to investigate the 509 genes across different tumors from 64 patients with multiple synchronous lung cancer at initial diagnosis.Our genomic study suggested that in 50 patients, different tumor foci do not have shared mutations, suggesting the tumor foci are of independent origin and are likely synchronous multiple primary lung cancers.The genotyping results align with histological and imagingbased conclusions.From the perspective of evolutionary biology, tumorigenesis follows Darwin's theory of natural selection, choosing a suitable way in both expansions and constraints.The continual appearance of random mutagenesis and universal selection of positive and negative induce the distinct tumorigenic alterations of each lesion.Sequencing technologies can effectively identify the clonality among multiple lesions within the same patient affected by certain genetic changes to distinguish the primary lesions from the metastatic ones.As evidenced by previous studies, the molecular spectrum of independent clones exhibits significant heterogeneity, and lesions sharing a high degree of great similarity in the mutation spectrum tend to be IMs.
To distinguish MPLC and IPM, the Clonality index (CI) is calculated to determine whether the likelihood of two samples derived from individual patient sharing the same mutations beyond expected co-occurrence by chance.Six patients exhibited a significantly higher CI above the cutoff, indicating that the foci are likely metastasis.Eight patients had CI close to cut-off, suggesting that although they may have originated from different sources initially, they evolved in a similar direction under comparable selection pressure.Based on our data, we propose a flow diagram for classification of MSLC using next-generation sequencing and CI2 (Figure 3).Overall, retrospective histologic reassessment of cases unambiguously classified by NGS will provide an unprecedented benchmark against which to refine histologic criteria for distinguishing MSLC and IPMs in future studies.
Additionally, there were some interesting cases in the study.We found 2 cases (Case 73 and Case 57) had primary EGFR T790M mutation.According to the previous report, which focused on stage III-IV patients, the results demonstrated that de novo T790M mutation was more likely to carry concurrent EGFR L858R mutation, 32,33 resulting in a favorable response to osimertinib treatment.In our cohort, the only 2 cases harboring primary EGFR T790M mutation were co-mutated with EGFR  L858R, indicating simultaneous occurrence of these two mutations at similar frequencies.Furthermore, a metaanalysis encompassing data from 22 clinical trials and reported that de novo EGFR T790M was an unfavorable predictive and prognostic marker in the late stage lung cancer. 34However, there was no research focus on stage I lung cancer, and these early stage patients will not receive any treatment after surgery, whether the primary EGFR T790M will be an unfavorable prognostic marker need further investigation.Another unique case (Case 31) is a 56 years woman, one of the tumor carried two activation site (G12D and G12V), and the mutation frequency was 5.7% and 7.8%, respectively.To the best of our knowledge, there are no existing studies reporting such phenomena; therefore follow-up results are necessary to ascertain whether this tumor exhibits increased aggressiveness.
Our study still had some deficiencies, for it was a single center study and the follow-up data was not available at this time.The sample size needs to be increased and intrapulmonary metastasis data would better be included.For the application of CI, we constructed positive and negative datasets using genomic data in TCGA which were dominated by Western individuals.The difference in population prevalence of some driver mutations between western population and eastern population may introduce bias into the model and cause inaccuracy of CI calculation when applied to Chinese population.Furthermore, to our knowledge, CI was only used in gynecological tumors like Synchronous Endometrioid Endometrial and Ovarian Carcinomas.Therefore, we should accumulate more data for lung cancer to optimize the application of CI.
Our results showed that the conclusions drawn from genotyping results are highly consistent with that drew from histology and imaging information, indicating that genomic markers can serve as valuable tools for facilitating clinical diagnosis.With the rapid development of genomic and sequencing technologies in recent years, the use of genomic markers has become increasingly prevalent in diagnostic settings.In particular, these markers can be useful in diagnosis and guide treatment plans for many diseases such as breast cancer.Our study here provides evidence that the incorporating genome profiling in addition to traditional histological and imaging approaches can provide more comprehensive analyses for the diagnosis of MSLC.
Clinical characteristics of 64 patients.
T A B L E 1Abbreviations: LLL, left lower lobe; LUL, left upper lobe; RLL, right lower lobe; RML, right middle lobe; RUL, right upper lobe.
The clonality index of 14 individuals.